NBLDA: Negative Binomial Linear Discriminant Analysis for RNA-Seq Data
RNA-sequencing (RNA-Seq) has become a powerful technology to characterize
gene expression profiles because it is more accurate and comprehensive than
microarrays. Although statistical methods that have been developed for
microarray data can be applied to RNA-Seq data, they are not ideal due to the
discrete nature of RNA-Seq data. The Poisson distribution and negative binomial
distribution are commonly used to model count data. Recently, Witten (2011)
proposed a Poisson linear discriminant analysis for RNA-Seq data. The Poisson
assumption may not be as appropriate as the negative binomial distribution when
biological replicates are available and overdispersion is present
(i.e., when the variance is larger than the mean). However, it is more
complicated to model negative binomial variables because they involve a
dispersion parameter that needs to be estimated. In this paper, we propose a
negative binomial linear discriminant analysis for RNA-Seq data. By Bayes'
rule, we construct the classifier by fitting a negative binomial model, and
propose some plug-in rules to estimate the unknown parameters in the
classifier. The relationship between the negative binomial classifier and the
Poisson classifier is explored, with a numerical investigation of the impact of
dispersion on the discriminant score. Simulation results show the superiority
of our proposed method. We also analyze four real RNA-Seq data sets to
demonstrate the advantage of our method in real-world applications.
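The Bayes-rule construction the abstract describes can be illustrated with a minimal sketch: score each class by its log prior plus the negative binomial log-likelihood of the counts, and classify to the highest score. The function names, the mean/dispersion parametrization (variance = mu + phi*mu^2), and the example values below are illustrative assumptions, not the paper's actual plug-in estimators.

```python
import math

def nb_logpmf(x, mu, phi):
    # Negative binomial log-pmf with mean mu and dispersion phi
    # (variance = mu + phi * mu**2); phi == 0 recovers the Poisson log-pmf.
    if phi == 0:
        return x * math.log(mu) - mu - math.lgamma(x + 1)
    r = 1.0 / phi          # size parameter
    p = r / (r + mu)       # success probability
    return (math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
            + r * math.log(p) + x * math.log(1 - p))

def nblda_scores(x, class_means, dispersions, priors):
    # Discriminant score for class k: log prior + sum over genes g of
    # log NB(x_g; mu_kg, phi_g), with gene-wise dispersions shared across classes.
    scores = []
    for k, mus in enumerate(class_means):
        ll = math.log(priors[k])
        for xg, mu, phi in zip(x, mus, dispersions):
            ll += nb_logpmf(xg, mu, phi)
        scores.append(ll)
    return scores

def classify(x, class_means, dispersions, priors):
    # Assign the sample to the class with the largest discriminant score.
    scores = nblda_scores(x, class_means, dispersions, priors)
    return max(range(len(scores)), key=lambda k: scores[k])

# Hypothetical two-class, two-gene example: class 0 has means (5, 20),
# class 1 has means (20, 5), shared dispersion 0.1, equal priors.
label = classify([4, 18], [[5, 20], [20, 5]], [0.1, 0.1], [0.5, 0.5])
```

Setting every dispersion to 0 in this sketch reduces the score to the Poisson classifier, which is the relationship between the two methods that the abstract explores numerically.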
The Extraction of Trajectories from Real Texts Based on Linear Classification
Proceedings of the 16th Nordic Conference
of Computational Linguistics NODALIDA-2007.
Editors: Joakim Nivre, Heiki-Jaan Kaalep, Kadri Muischnek and Mare Koit.
University of Tartu, Tartu, 2007.
ISBN 978-9985-4-0513-0 (online)
ISBN 978-9985-4-0514-7 (CD-ROM)
pp. 121-127
Document-Level Relation Extraction with Reconstruction
In document-level relation extraction (DocRE), graph structure is generally
used to encode relation information in the input document to classify the
relation category between each entity pair, and has greatly advanced the DocRE
task over the past several years. However, the learned graph representation
universally models relation information between all entity pairs regardless of
whether there are relationships between these entity pairs. Thus, entity
pairs without relationships disperse the attention of the encoder-classifier
DocRE model away from those with relationships, which may further hinder the
improvement of DocRE. To alleviate this issue, we propose a novel
encoder-classifier-reconstructor model for DocRE. The reconstructor manages to
reconstruct the ground-truth path dependencies from the graph representation,
to ensure that the proposed DocRE model pays more attention to encoding entity
pairs with relationships during training. Furthermore, the reconstructor is
regarded as a relationship indicator to assist relation classification during
inference, which can further improve the performance of the DocRE model.
Experimental results on a large-scale DocRE dataset show that the proposed
model can significantly improve the accuracy of relation extraction on a strong
heterogeneous graph-based baseline.
Comment: 9 pages, 5 figures, 6 tables. Accepted by AAAI 2021 (Long Paper)
SVIT: Scaling up Visual Instruction Tuning
Thanks to the emergence of foundation models, large language and vision
models have been integrated to acquire multimodal abilities such as visual
captioning, dialogue, and question answering. Although existing multimodal
models present
impressive performance of visual understanding and reasoning, their limits are
still largely under-explored due to the scarcity of high-quality instruction
tuning data. To push the limits of multimodal capability, we Scale up Visual
Instruction Tuning (SVIT) by constructing a dataset of 3.2 million visual
instruction tuning examples, including 1.6M conversation question-answer (QA)
pairs, 1.6M complex reasoning QA pairs, and 106K detailed image descriptions.
Beyond its volume, the proposed dataset also features high quality and rich
diversity, as it is generated by prompting GPT-4 with abundant manual
annotations of images. We empirically verify that training multimodal models
on SVIT can significantly improve multimodal performance in terms of visual
perception, reasoning, and planning.